Bayesian multistudy factor analysis for high-throughput biological data
Abstract
This paper analyzes breast cancer gene expression across seven studies to identify genuine and thus replicable gene patterns shared among these studies. Our premise is that genuine biological signal is more likely to be reproducibly present in multiple studies than spurious signal. Our analysis uses a new modeling strategy for the joint analysis of high-throughput biological studies, which simultaneously identifies shared as well as study-specific signal. To this end, we generalize the multi-study factor analysis model to handle high-dimensional data in the sparse Bayesian infinite factor model context. We provide strategies for the identification of the loading matrices, both common and study-specific. Through extensive simulation analysis, we characterize the performance of the proposed approach in various scenarios and show that it outperforms standard factor analysis in identifying replicable signal in all the scenarios considered. The breast cancer analysis identifies clear replicable patterns. These patterns are related to well-known pathways involved in breast cancer, such as ER, cell cycle, immune system, collagen, and metabolic pathways. Some patterns are also associated with existing breast cancer subtypes, such as LumA, Her2, and basal, while other patterns are novel and active across subtypes, and are thus missed by hierarchical clustering approaches. The R package MSFA implementing the method is available on GitHub.
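To make the shared versus study-specific decomposition concrete, the following is a minimal simulation sketch of a generic multi-study factor model. The notation is an assumption and does not appear in the abstract: each study s has data x = Phi f + Lambda_s l + e, with shared loadings Phi, study-specific loadings Lambda_s, and diagonal noise variances psi_s. The function name `simulate_multistudy` and all parameter names are hypothetical. This is not the MSFA package API, and it does not implement the sparse Bayesian infinite factor prior. It only shows the covariance structure such a model implies.

```python
import numpy as np

def simulate_multistudy(n_per_study, p, k_shared, k_specific, seed=0):
    """Draw data from a generic multi-study factor model (illustrative notation)."""
    rng = np.random.default_rng(seed)
    # Shared loadings Phi (p x k_shared): the same matrix in every study.
    phi = rng.normal(size=(p, k_shared))
    studies = []
    for n_s, j_s in zip(n_per_study, k_specific):
        # Study-specific loadings Lambda_s (p x j_s) and noise variances psi_s.
        lam = rng.normal(size=(p, j_s))
        psi = rng.uniform(0.5, 1.5, size=p)
        f = rng.normal(size=(n_s, k_shared))        # shared factor scores
        l = rng.normal(size=(n_s, j_s))             # study-specific factor scores
        e = rng.normal(size=(n_s, p)) * np.sqrt(psi)  # idiosyncratic noise
        x = f @ phi.T + l @ lam.T + e
        # Implied covariance: Phi Phi^T + Lambda_s Lambda_s^T + diag(psi_s).
        sigma = phi @ phi.T + lam @ lam.T + np.diag(psi)
        studies.append({"x": x, "lambda": lam, "psi": psi, "sigma": sigma})
    return phi, studies

# Three hypothetical studies measuring the same 10 variables.
phi, studies = simulate_multistudy(
    n_per_study=[50, 80, 60], p=10, k_shared=2, k_specific=[1, 2, 1]
)
```

In this sketch the term `phi @ phi.T` is identical across studies, which is the part a replicable signal would correspond to, while `lam @ lam.T` differs from study to study.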
Similar resources
Novel Bioinformatics Approaches for Analysis of High-Throughput Biological Data
(1) Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan; (2) Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan 320, Taiwan; (3) Institute of Systems Biology and Bioinformatics, National Central University, Taoyuan 320, Taiwan; (4) Institute of Tropical Plant Sciences, National Cheng Kung University, Tainan 701, Taiwan; (5) Graduate Institute of...
Statistical Methods for High-Throughput Biological Data
The explosion in DNA microarray technology in the last decade has given rise to extensive biological data in the form of expression profiles of tens of thousands of genes and proteins, often from only a handful of tissue samples. The principal objective of a high-throughput experiment can be generally characterized as one of class comparison, class prediction or molecular pattern discovery. Cla...
Pathway analysis of high-throughput biological data within a Bayesian network framework
MOTIVATION: Most current approaches to high-throughput biological data (HTBD) analysis perform either individual gene/protein analysis or gene/protein set enrichment analysis for a list of biologically relevant molecules. Bayesian Networks (BNs) capture linear and non-linear interactions, handle stochastic events accounting for noise, and focus on local interactions, which can be related to cau...
Integrative Modeling and Analysis of High-throughput Biological Data
Computational biology is an interdisciplinary field that focuses on developing mathematical models and algorithms to interpret biological data so as to understand biological problems. With current high-throughput technology development, different types of biological data can be measured on a large scale, which calls for more sophisticated computational methods to analyze and interpret the data. I...
Computational Methods for Learning Bayesian Networks from High-Throughput Biological Data
Data from high-throughput technologies, such as gene expression microarrays, promise to yield insight into the nature of the cellular processes that have been disrupted by disease, thus improving our understanding of the disease and hastening the discovery of effective new treatments. Most of the analysis thus far has focused on identifying differential measurements, which form the basis of bio...
Journal
Journal title: The Annals of Applied Statistics
Year: 2021
ISSN: 1941-7330, 1932-6157
DOI: https://doi.org/10.1214/21-aoas1456